Abstract



During the last decade the field of speech recognition has used the theory of hidden 
Markov models (HMMs) with great success. At the same time there is now a wide 
perception in the speech research community that new ideas are needed to continue 
improvements in performance. This report represents a small contribution to this effort.
We explore an alternative acoustic modeling approach based on Factorial Hidden
Markov Models (FHMMs). These are presented as possible extensions to HMMs. We 
show results for phonetic classification experiments using the phonetically balanced 
TIMIT database which compare the performance of FHMMs with HMMs and parallel 
HMMs. 
1 Introduction

In recent years hidden Markov models have become the dominant technology in speech 
recognition. HMMs provide a very useful paradigm to model the dynamics of speech 
signals. They provide a solid mathematical formulation for the problem of learning 
HMM parameters from speech observations. Furthermore, efficient and fast algorithms
exist for the problem of computing the most likely model given a sequence of
observations.

Due to this success, there has recently been some interest in exploring possible 
extensions to HMMs. These include factorial HMMs [Ghahramani and Jordan, 1996] 
and coupled HMMs [Brand, 1997] among others. In this report we explore factorial 
HMMs. These were first introduced by Ghahramani [Ghahramani and Jordan, 1996]
and extend HMMs by allowing the modeling of several loosely coupled stochastic
processes. Factorial HMMs can be seen either as an extension to HMMs or as a
modeling technique in the Bayesian Belief Networks [Russell and Norvig, 1995]
domain. In this report we choose to approach them as extensions to HMMs.

The report is organized as follows. We start by describing the basic theory of 
HMMs and then present FHMMs as extensions of these. We continue
by presenting an extension of the traditional HMM Baum-Welch learning algorithm
applied to FHMMs. We then describe several experiments designed to compare their
performance with traditional HMMs. We end this report with our conclusions and
suggestions for future work.

2 Factorial Hidden Markov Models 

Factorial HMMs were first described in [Ghahramani and Jordan, 1996]. In his original
work Ghahramani presents FHMMs and introduces several methods to efficiently learn 
their parameters. Our focus, however, is on studying the applicability of FHMMs to 
speech modeling. Our goal is to study FHMMs as a viable replacement for HMMs. 

To this end, we have made an effort to explain FHMMs as extensions of HMMs, 
making connections between these two techniques when possible. We assume the 
reader is somewhat familiar with HMM theory. 

2.1 Model Description 

The description requires us to first briefly introduce hidden Markov models. These 
models are the dominant technology used for speech recognition. Tractable, well
understood training and testing algorithms exist to estimate the model parameters and
evaluate the likelihood of alternative speech utterances. Their main strength lies in
their ability to capture the dynamic information in the speech signal. They are able to
model dynamic patterns, i.e., patterns of variable length. This is important because,
for example, the same phoneme uttered by the same speaker can vary in length.

Hidden Markov models are probabilistic models which describe a sequence of
acoustic observation vectors $Y = \{Y_t : t = 1, \ldots, T\}$. The random process
generating the observations is modeled as being in one of $K$ states. The states are not
observable, hence the "hidden" nature of the model. Each state can be thought of as
representing particular speech patterns or regions.

The parameters of the HMM are the probability density functions (pdf) describing 
the statistics of the acoustic vectors being produced or generated by each of the states, 
and the transition probabilities modeling the likelihoods of evolving from one state to 
another. For a first order HMM, this transition probability depends only on the current 
state. 

The probability that an observation sequence $Y$ is generated given the model is
expressed as follows

$$p(Y \mid \lambda) = \sum_{S} \pi(S_1)\, p(Y_1 \mid S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1})\, p(Y_t \mid S_t) \qquad (1)$$



Here:

$Y$ = a sequence of $N$-dimensional observation vectors $\{Y_t, t = 1, \ldots, T\}$
$S$ = a sequence of states $\{S_t, t = 1, \ldots, T\}$
$P(S_t \mid S_{t-1})$ = transition probability from state $S_{t-1}$ to state $S_t$
$\pi(S_1)$ = the probability of being in state $S_1$ at time $t = 1$
$p(Y_t \mid S_t)$ = pdf of the observation vector $Y_t$ given the state $S_t$,
typically modeled as a mixture of Gaussians
$K$ = the number of states in the model
$\lambda$ = the model parameters = $\{K, \{P(S_t \mid S_{t-1})\}, \{p(Y_t \mid S_t)\}\}$
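Equation 1 can be evaluated literally by enumerating every state sequence. The sketch below does this for a toy two-state model; the function name, the argument layout and all numeric values are illustrative assumptions, and the observation pdfs are assumed to have been evaluated already for each frame.

```python
from itertools import product

def sequence_likelihood(prior, trans, obs_lik):
    """Evaluate Equation 1 by summing over every state sequence S.

    prior[k]      : pi(S_1 = k)
    trans[i][j]   : P(S_t = j | S_{t-1} = i)
    obs_lik[t][k] : p(Y_t | S_t = k), already evaluated for each frame
    """
    K, T = len(prior), len(obs_lik)
    total = 0.0
    for states in product(range(K), repeat=T):
        # Probability of this particular state sequence and the observations.
        p = prior[states[0]] * obs_lik[0][states[0]]
        for t in range(1, T):
            p *= trans[states[t - 1]][states[t]] * obs_lik[t][states[t]]
        total += p
    return total

# Toy 2-state left-to-right model with made-up numbers (illustrative only).
prior = [0.6, 0.4]
trans = [[0.7, 0.3], [0.0, 1.0]]
obs_lik = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.6]]
likelihood = sequence_likelihood(prior, trans, obs_lik)
```

The loop visits $K^T$ sequences, so this form is only usable for very short utterances; it is the cost that the recursions of Section 2.3 avoid.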

In the speech community an HMM is typically represented as shown in Figure 1.
Here each state is shown explicitly and the arrows show allowable transitions
between states. However, an HMM can also be represented as a dynamic belief network
[Russell and Norvig, 1995] as shown in Figure 2. This alternative representation shows
the evolution of the state sequence with time since each node represents the state at each
time slice. This switch to the dynamic belief network view shows the manner in which
HMMs can be generalized to FHMMs.

The factorial HMM arises by forming a dynamic belief network composed of
several "layers". This is shown in Figure 3. We see here that each layer has independent
dynamics but that the observation vector depends upon the current state in each of the
layers. This is achieved by allowing the state variable in Equation 1 to be composed
of a collection of states. That is, we now have a "meta-state" variable $S_t$ which is
composed of $M$ states as follows

$$S_t = S_t^{(1)}, \ldots, S_t^{(M)} \qquad (2)$$

Here the superscript is the layer index with $M$ being the number of layers. The layered
nature of the model arises by restricting transitions between the states in different
layers. Were we to allow unrestricted transitions between states in different layers we
would simply have a regular HMM with a $K^M \times K^M$ transition matrix. Intermediate
architectures in which some limited transitions between states in different layers are
allowed have also been presented in [Brand, 1997].

Figure 2: Dynamic Belief Network representation of a Hidden Markov Model

Figure 3: Factorial Hidden Markov Model (Layers 0, 1 and 2, each with its own state chain)



By dividing the states into layers we form a system that can model several processes
with independent dynamics which are loosely coupled. Each layer has similar dynamics
to a basic hidden Markov model but the probability of an observation at each time
depends upon the current state in all of the layers. In our formulation it is assumed for
simplicity that in each layer, the state variable can take on one of $K$ distinct values at
each time (rather than assuming that the number of possible states within each layer is
different). Thus we have a system that requires $M$ transition matrices of size
$K \times K$. It should be noted that this system could still be represented as a regular
HMM with a $K^M \times K^M$ transition matrix with zeros representing illegal transitions.

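For example, consider a 2-layer system with 3 states per layer. Let the transition
matrices for layer 0 and layer 1 be $A_0$ and $A_1$ respectively.

$$A_0 = \begin{pmatrix} a_0 & b_0 & c_0 \\ 0 & d_0 & e_0 \\ 0 & 0 & 1 \end{pmatrix} \qquad A_1 = \begin{pmatrix} a_1 & b_1 & c_1 \\ 0 & d_1 & e_1 \\ 0 & 0 & 1 \end{pmatrix}$$
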
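The transition matrix for the equivalent basic HMM system is built by creating a
Cartesian product (the Kronecker product $A_0 \otimes A_1$) of the two original matrices
$A_0$ and $A_1$

$$\begin{pmatrix}
a_0 a_1 & a_0 b_1 & a_0 c_1 & b_0 a_1 & b_0 b_1 & b_0 c_1 & c_0 a_1 & c_0 b_1 & c_0 c_1 \\
0 & a_0 d_1 & a_0 e_1 & 0 & b_0 d_1 & b_0 e_1 & 0 & c_0 d_1 & c_0 e_1 \\
0 & 0 & a_0 & 0 & 0 & b_0 & 0 & 0 & c_0 \\
0 & 0 & 0 & d_0 a_1 & d_0 b_1 & d_0 c_1 & e_0 a_1 & e_0 b_1 & e_0 c_1 \\
0 & 0 & 0 & 0 & d_0 d_1 & d_0 e_1 & 0 & e_0 d_1 & e_0 e_1 \\
0 & 0 & 0 & 0 & 0 & d_0 & 0 & 0 & e_0 \\
0 & 0 & 0 & 0 & 0 & 0 & a_1 & b_1 & c_1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & d_1 & e_1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
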

resulting in a transition matrix with $K^M = 9$ states. As we can see, an explosion in the
number of states occurs. For this reason, as we note in Section 2.3, it is preferable to use
the $M$ $K \times K$ transition matrices over the equivalent $K^M \times K^M$ representation
simply on computational grounds.
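The construction can be checked numerically. The following sketch builds the flat meta-state matrix for two left-to-right layers and counts the stored entries in each representation; the numeric transition values are invented for illustration, and the meta-state index is assumed to be (layer 0 state) x K + (layer 1 state).

```python
def kron(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

# Layer matrices with the same zero pattern as the example above;
# the numeric values are invented for illustration.
A0 = [[0.6, 0.3, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
A1 = [[0.5, 0.4, 0.1], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]

A = kron(A0, A1)                    # 9 x 9 meta-state transition matrix
K, M = 3, 2
factored_entries = M * K * K        # entries stored by the layered form
flat_entries = (K ** M) ** 2        # entries stored by the flat HMM form
```

For K = 3 and M = 2 the layered form stores 18 numbers against 81 for the flat matrix, and the gap widens rapidly as layers are added.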

We now consider the probability of the observation given the meta-state. As
mentioned, this probability depends on the current state in all the layers. In our work, we
have used two different ways of combining the information from the layers. The first
method assumes that the observation is distributed according to a Gaussian pdf with a
common covariance and the mean being a linear combination of the state means. This
formulation was originally proposed by Ghahramani [Ghahramani and Jordan, 1996]
and is shown in Equation 3. We refer to this model as a "linear" factorial HMM.

$$p(Y_t \mid S_t) = \mathcal{N}\!\left(Y_t;\ \sum_{m=1}^{M} \mu^{(m \mid S_t)},\ C\right) \qquad (3)$$

Here $\mu^{(m \mid S_t)}$ is the mean of layer $m$ given the meta-state $S_t$, $C$ is the
covariance and $\mathcal{N}(\cdot\,; \mu, C)$ denotes a Gaussian pdf with mean $\mu$ and covariance $C$.
Other symbols are as previously defined.
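As a concrete reading of the linear model, the sketch below evaluates the log of the observation pdf for one meta-state. It assumes a diagonal common covariance to stay within the standard library; that restriction, the function name and the toy numbers are assumptions of this example, not part of the formulation above.

```python
import math

def linear_fhmm_log_pdf(y, layer_means, meta_state, var):
    """Log of the linear FHMM observation pdf with a diagonal covariance.

    y           : observation vector (list of floats)
    layer_means : layer_means[m][k] is the mean vector of state k in layer m
    meta_state  : tuple with one state index per layer
    var         : diagonal of the shared covariance C (assumed diagonal here)
    """
    # The mean of the meta-state is the sum of the selected layer means.
    mean = [sum(layer_means[m][k][d] for m, k in enumerate(meta_state))
            for d in range(len(y))]
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (yd - md) ** 2 / v)
               for yd, md, v in zip(y, mean, var))

layer_means = [
    [[0.0, 0.0], [1.0, 0.0]],   # layer 0, states 0 and 1
    [[0.0, 0.0], [0.0, 2.0]],   # layer 1, states 0 and 1
]
log_p = linear_fhmm_log_pdf([1.0, 2.0], layer_means, (1, 1), [1.0, 1.0])
```

Every layer contributes to every component of the mean, which is why the layers remain coupled when the means are reestimated.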

The second combination method assumes that $p(Y_t \mid S_t)$ is the product of the
(Gaussian) distributions of each layer. We refer to this technique as the "streamed" method.
Each layer of the FHMM models a stream of the observation vector. The idea of
streams has already been proposed in the speech research community. Recognition
engines like SPHINX [Lee et al., 1990] and HTK [Young et al., 1993] allow similar
formulations in their HMM systems. The difference between our formulation and theirs
is that a "streamed" FHMM allows more decoupling in the streams' dynamics.

The equation for the observation probabilities in our streamed case is

$$p(Y_t \mid S_t) = \prod_{m=1}^{M} \mathcal{N}\!\left(M_m Y_t;\ \mu^{(m \mid S_t)},\ C\right) \qquad (4)$$

Here the matrix $M_m$ partitions the observation vector into streams. For example in a
two-layer system we have

$$M_0 = ( I_D \mid 0_D ) \qquad M_1 = ( 0_D \mid I_D )$$

Here $I_D$ is the $D \times D$ identity matrix, $0_D$ is the $D \times D$ zero matrix and $D$ is
the dimensionality of each of the streams.
We will discuss later in more detail the motivation for this alternative formulation.

Notice that here we use a single covariance although extending this formulation 
to use a different covariance for each stream or for each state within the stream is 
straightforward. 
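A minimal sketch of the streamed observation probability follows. It assumes equally sized streams laid out consecutively in the observation vector (which is what the partition matrices select) and a diagonal shared covariance; the function names and toy values are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xd - md) ** 2 / v)
               for xd, md, v in zip(x, mean, var))

def streamed_fhmm_log_pdf(y, layer_means, meta_state, var):
    """Log of the streamed pdf: a sum of per-stream Gaussian log densities.

    Stream m is the m-th block of D consecutive components of y.
    var is the shared D-dimensional diagonal covariance (an assumption).
    """
    D = len(var)
    total = 0.0
    for m, k in enumerate(meta_state):
        stream = y[m * D:(m + 1) * D]          # plays the role of M_m Y_t
        total += gaussian_log_pdf(stream, layer_means[m][k], var)
    return total

layer_means = [
    [[0.0, 0.0], [1.0, 1.0]],   # layer 0 models components 0..1
    [[0.0, 0.0], [2.0, 2.0]],   # layer 1 models components 2..3
]
y = [1.0, 1.0, 2.0, 2.0]
log_p = streamed_fhmm_log_pdf(y, layer_means, (1, 1), [1.0, 1.0])
```

Because each layer only sees its own block of the observation, changing the state of one layer leaves the other layer's contribution untouched.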



2.2 Estimation of Parameters 

The model parameters are the means of the states in each layer, the transition
probabilities between states in each layer, the prior probabilities of each state and the
covariance. All these parameters can be estimated using the Expectation Maximization (EM)
algorithm [Dempster et al., 1977]. Due to our slightly different formulation of the acoustic
probability, the algorithm we present here is different from, but equivalent to, that presented
in [Ghahramani and Jordan, 1996].

The basic workings of the algorithm are well known. Model parameters are initialized
and then reestimated to maximize a so-called auxiliary function. Each iteration is
guaranteed not to decrease the likelihood of the observations given the model.
Only convergence to a local maximum is guaranteed.

We first discuss reestimation of the model parameters by maximization of the
auxiliary function.

The auxiliary function to be maximized is

$$\phi(\lambda, \lambda') = \sum_{S} p_{\lambda}(S \mid Y) \ln p_{\lambda'}(S, Y) \qquad (5)$$

In this and subsequent equations the prime denotes the reestimated or new model
parameters.

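Substituting Equations 1 and 2 into 5 we have

$$\phi(\lambda, \lambda') = \sum_{S} p_{\lambda}(S \mid Y) \left[ \ln \pi(S_1) + \sum_{t=2}^{T} \sum_{m=1}^{M} \ln P(S_t^{(m)} \mid S_{t-1}^{(m)}) + \sum_{t=1}^{T} \ln p(Y_t \mid S_t) \right]$$

Here $P(S_t^{(m)} \mid S_{t-1}^{(m)})$ is the transition probability between state
$S_{t-1}^{(m)}$ and $S_t^{(m)}$, and the quantities inside the brackets are evaluated under the
new parameters $\lambda'$. This equation can be separated into components which depend
only on each set of parameters to be reestimated.

$$\phi(\lambda, \lambda') = \phi_a(\lambda, \lambda') + \phi_b(\lambda, \lambda') + \phi_c(\lambda, \lambda')$$

Here $\phi_a(\lambda, \lambda')$ is the part of $\phi(\lambda, \lambda')$ which depends on the prior
probabilities, $\phi_b(\lambda, \lambda')$ is the part which depends only on the transition
probabilities and $\phi_c(\lambda, \lambda')$ is the part which depends only on the means and
covariance.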

We present here formulas for the single observation case. Extension to multiple 
observations is straightforward. 






2.2.1 Reestimation of the Means 

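The means are reestimated by maximizing $\phi_c(\lambda, \lambda')$. For linear FHMMs (means
combined using Equation 3) the auxiliary function becomes (ignoring the term in
$\phi_c(\lambda, \lambda')$ containing only the covariance)

$$\phi_c(\lambda, \lambda') = -\frac{1}{2} \sum_{S} p_{\lambda}(S \mid Y) \sum_{t=1}^{T} \left( Y_t - \sum_{m=1}^{M} \mu^{(m \mid S_t)} \right)^{\top} C^{-1} \left( Y_t - \sum_{m=1}^{M} \mu^{(m \mid S_t)} \right) \qquad (6)$$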



To reestimate the $i$th mean of the $n$th layer we take the derivative of Equation 6 with
respect to $\mu_i^{(n)}$ and set it equal to zero. This leads to the following equation

$$0 = \sum_{t=1}^{T} \sum_{S_t : S_t^{(n)} = i} P(S_t \mid Y, \lambda) \left( Y_t - \sum_{m=1}^{M} \mu^{(m \mid S_t)} \right)$$

where $P(S_t \mid Y, \lambda)$ is the posterior probability of meta-state $S_t$ given the
observations and the model.

This equation cannot be solved for $\mu_i^{(n)}$ on its own, because it also involves the
means of the other layers. However, if the process is repeated
for all the means of all the layers, $K \times M$ equations will be generated for the $K \times M$
means. These can be solved using matrix algebra, although in practice efficient matrix
inversion techniques capable of handling ill-conditioned matrices are needed.
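One way to see the coupled system is to write the stationarity condition in terms of marginal posteriors. The shorthand $\gamma$ below is introduced only for this illustration and does not appear in the original formulation; $\mu_j^{(m)}$ denotes the mean of state $j$ in layer $m$.

```latex
\gamma_t^{(n)}(i) = \sum_{S_t : S_t^{(n)} = i} P(S_t \mid Y, \lambda), \qquad
\gamma_t^{(n,m)}(i,j) = \sum_{S_t : S_t^{(n)} = i,\; S_t^{(m)} = j} P(S_t \mid Y, \lambda)

\sum_{t=1}^{T} \gamma_t^{(n)}(i)\, Y_t
  = \sum_{m=1}^{M} \sum_{j=1}^{K}
    \left( \sum_{t=1}^{T} \gamma_t^{(n,m)}(i,j) \right) \mu_j^{(m)}
```

Writing one such equation for every pair $(n, i)$ gives a linear system whose unknowns are all $K \times M$ means at once; the left-hand sides and the bracketed coefficients are fixed by the posteriors computed in the E-step.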

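If the streamed method is used to combine the means then the equations become
somewhat more decoupled. The auxiliary function is now

$$\phi_c(\lambda, \lambda') = -\frac{1}{2} \sum_{S} p_{\lambda}(S \mid Y) \sum_{t=1}^{T} \sum_{m=1}^{M} \left( M_m Y_t - \mu^{(m \mid S_t)} \right)^{\top} C^{-1} \left( M_m Y_t - \mu^{(m \mid S_t)} \right) \qquad (7)$$

Solving for $\mu_i^{(n)}$ we have

$$\mu_i^{(n)} = \frac{\sum_{t=1}^{T} \sum_{S_t : S_t^{(n)} = i} P(S_t \mid Y, \lambda)\, M_n Y_t}{\sum_{t=1}^{T} \sum_{S_t : S_t^{(n)} = i} P(S_t \mid Y, \lambda)}$$
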
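In the streamed case the update for a mean is a posterior-weighted average of the frames of its own stream. A small sketch, assuming the meta-state posteriors have already been summed over the other layers to give per-layer posteriors (the names and numbers are illustrative):

```python
def reestimate_stream_mean(frames, gamma, n, i, D):
    """Posterior-weighted average of stream n for state i.

    frames[t]   : full observation vector Y_t
    gamma[t][k] : posterior of layer n being in state k at time t,
                  i.e. P(S_t | Y) summed over the states of the other layers
    D           : dimensionality of each stream
    """
    num = [0.0] * D
    den = 0.0
    for y, g in zip(frames, gamma):
        stream = y[n * D:(n + 1) * D]          # plays the role of M_n Y_t
        for d in range(D):
            num[d] += g[i] * stream[d]
        den += g[i]
    return [v / den for v in num]

# Two frames, two streams of dimension 1, made-up posteriors for layer 1.
frames = [[0.0, 2.0], [0.0, 4.0]]
gamma = [[0.25, 0.75], [0.75, 0.25]]
new_mean = reestimate_stream_mean(frames, gamma, n=1, i=1, D=1)
```

No linear system has to be solved here, in contrast with the linear FHMM.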
2.2.2 Reestimating the Covariance 

The covariance is reestimated by maximizing $\phi_c(\lambda, \lambda')$ with respect to $C$.
In the linear case, the reestimation formula is

$$C = \frac{1}{T} \sum_{t=1}^{T} \sum_{S_t} P(S_t \mid Y, \lambda) \left( Y_t - \sum_{m=1}^{M} \mu^{(m \mid S_t)} \right) \left( Y_t - \sum_{m=1}^{M} \mu^{(m \mid S_t)} \right)^{\top}$$

For the streamed case, notice that the reestimation formula is very similar to the
usual covariance reestimation formula for HMMs.



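$$C = \frac{1}{T} \sum_{t=1}^{T} \sum_{S_t} \sum_{m=1}^{M} P(S_t \mid Y, \lambda) \left( M_m Y_t - \mu^{(m \mid S_t)} \right) \left( M_m Y_t - \mu^{(m \mid S_t)} \right)^{\top}$$
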

If it is desired that each stream has a separate covariance, then the reestimation formula 
reduces to the usual HMM covariance reestimation formula with the observation being 
the part of the feature vector for that stream. 

2.2.3 Reestimating the Transition Probabilities 

The transition probabilities are reestimated by maximizing $\phi_b(\lambda, \lambda')$. We have

$$\phi_b(\lambda, \lambda') = \sum_{S} p_{\lambda}(S \mid Y) \sum_{t=2}^{T} \sum_{m=1}^{M} \ln P(S_t^{(m)} \mid S_{t-1}^{(m)}) \qquad (8)$$



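Now let $a_{ij}^{(n)}$ be the transition probability from state $i$ to state $j$ in layer $n$.
Maximizing Equation 8 with respect to $a_{ij}^{(n)}$ gives the following reestimation formula

$$a_{ij}^{(n)} = \frac{\sum_{t=2}^{T} \sum_{S_{t-1}, S_t : S_{t-1}^{(n)} = i,\; S_t^{(n)} = j} P(S_t \mid S_{t-1}, Y)}{\sum_{t=2}^{T} \sum_{S_{t-1}, S_t : S_{t-1}^{(n)} = i} P(S_t \mid S_{t-1}, Y)}$$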



2.3 Calculation of the Posterior Probabilities 

The reestimation formulas require the calculation of $P(S_t \mid Y, \lambda)$ and
$P(S_t \mid S_{t-1}, Y, \lambda)$, which we will refer to for notational simplicity as
$P(S_t \mid Y)$ and $P(S_t \mid S_{t-1}, Y)$ respectively.

Direct computation of these using Equations 1 and 2 would require $O(2T(K^M)^T)$
calculations, which is intractable. This can be reduced to $O(TK^{2M})$ by use of the
so-called Forward-Backward or Baum-Welch algorithm [Rabiner, 1989].

In HMMs the usual method to calculate $P(S_t \mid Y)$ and $P(S_t \mid S_{t-1}, Y)$ is to define
so-called Forward and Backward probabilities. The Forward Probability $\alpha_t(j)$ is
defined as

$$\alpha_t(j) = P(Y_1, \ldots, Y_t, S_t = j \mid \lambda)$$

That is, the probability of observing the first $t$ speech vectors and being in the $j$th state at
time $t$. Similarly the Backward Probability $\beta_t(j)$ is defined as

$$\beta_t(j) = P(Y_{t+1}, \ldots, Y_T \mid S_t = j, \lambda)$$

These probabilities can be calculated using simple recursion and they can be combined
to give $P(S_t \mid Y)$ and $P(S_t \mid S_{t-1}, Y)$.

$$P(S_t = j \mid S_{t-1} = i, Y) = \frac{\alpha_{t-1}(i)\, a_{ij}\, P(Y_t \mid S_t = j)\, \beta_t(j)}{\alpha_{t-1}(i)\, \beta_{t-1}(i)}$$
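For a plain HMM the two recursions fit in a few lines. This is a textbook-style sketch without the scaling that long utterances would need; the function name and the toy model are assumptions made for the example.

```python
def forward_backward(prior, trans, obs_lik):
    """Return alpha, beta and the state posteriors P(S_t = j | Y)."""
    K, T = len(prior), len(obs_lik)
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[1.0] * K for _ in range(T)]      # beta at the final frame is 1
    for j in range(K):
        alpha[0][j] = prior[j] * obs_lik[0][j]
    # Forward pass: fold in one observation at a time.
    for t in range(1, T):
        for j in range(K):
            alpha[t][j] = obs_lik[t][j] * sum(
                alpha[t - 1][i] * trans[i][j] for i in range(K))
    # Backward pass: account for the observations after time t.
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(trans[i][j] * obs_lik[t + 1][j] * beta[t + 1][j]
                             for j in range(K))
    likelihood = sum(alpha[T - 1])
    gamma = [[alpha[t][j] * beta[t][j] / likelihood for j in range(K)]
             for t in range(T)]
    return alpha, beta, gamma

prior = [0.6, 0.4]
trans = [[0.7, 0.3], [0.0, 1.0]]
obs_lik = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.6]]
alpha, beta, gamma = forward_backward(prior, trans, obs_lik)
```

Each time step costs on the order of $K^2$ operations, which becomes $K^{2M}$ when the $K^M$ meta-states of an FHMM are treated as ordinary HMM states.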
Unfortunately, in the factorial HMM case, the state $S_t$ is actually a meta-state.
Therefore, to calculate the $\alpha_t$ and $\beta_t$ terms we would have to perform recursion
over all the layers as well as all time. Ghahramani [Ghahramani and Jordan, 1996] presents a
modified version of the Baum-Welch algorithm which does not depend on a $K^M \times K^M$
transition matrix. Making use of the fact that each layer has independent dynamics, the
calculations can be reduced to $O(TMK^{M+1})$. This is tractable for small $K$ and $M$.
We present here Ghahramani's method with the equations to calculate $\alpha_t$ in slightly
more detail. The equations for $\beta_t$ follow a similar pattern and are not presented here.
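The source of the saving can be checked numerically for two layers. The sketch below advances a table of forward probabilities by one transition step in two ways: with the full meta-state transition probabilities, and one layer at a time. It is an illustration with invented values and hypothetical function names, not Ghahramani's implementation, and it omits the multiplication by the observation probability.

```python
def predict_flat(alpha, A0, A1):
    """One transition step using every meta-state to meta-state probability."""
    K = len(A0)
    return [[sum(alpha[i][j] * A0[i][i2] * A1[j][j2]
                 for i in range(K) for j in range(K))
             for j2 in range(K)] for i2 in range(K)]

def predict_factored(alpha, A0, A1):
    """The same step, applying one layer's K x K matrix at a time."""
    K = len(A0)
    # Pass 1: advance layer 1 while layer 0 is held at its previous state.
    tmp = [[sum(alpha[i][j] * A1[j][j2] for j in range(K))
            for j2 in range(K)] for i in range(K)]
    # Pass 2: advance layer 0.
    return [[sum(tmp[i][j2] * A0[i][i2] for i in range(K))
             for j2 in range(K)] for i2 in range(K)]

A0 = [[0.6, 0.3, 0.1], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]
A1 = [[0.5, 0.4, 0.1], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]
# alpha[i][j]: layer 0 in state i, layer 1 in state j (made-up values).
alpha = [[0.2, 0.1, 0.0], [0.3, 0.1, 0.1], [0.0, 0.1, 0.1]]

flat = predict_flat(alpha, A0, A1)
fact = predict_factored(alpha, A0, A1)
```

With K = 3 and M = 2 the flat step sums 81 terms while the two passes together sum 54, matching the $K^{2M}$ versus $MK^{M+1}$ counts per time step.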

To calculate $\alpha_t$ we use the following recursion in space, i.e. for every time instant
we perform a recursion across the layers

$$\alpha_t(i, j, \ldots, z) = p(Y_t \mid S_t)\, \alpha_t^{(0)}(i, j, \ldots, z) \qquad (9)$$

$$\alpha_t^{(m-1)}(i, j, \ldots, z) = \sum_{S_{t-1}^{(m)}} P(S_t^{(m)} \mid S_{t-1}^{(m)})\, \alpha_t^{(m)}(i, j, \ldots, z) \qquad (10)$$

$$\alpha_t^{(m)}(i, j, \ldots, z) = P(S_{t-1}^{(1)}, \ldots, S_{t-1}^{(m)}, S_t^{(m+1)}, \ldots, S_t^{(M)}, Y_1, \ldots, Y_{t-1} \mid \lambda) \qquad (11)$$

$$\alpha_t^{(M)}(i, j, \ldots, z) = \alpha_{t-1}(i, j, \ldots, z) \qquad (12)$$

Here the indices of $\alpha_t(i, j, \ldots, z)$ refer to the states in each layer. That is, the
state at time $t$ in layer 0 takes value $i$, the state in layer 1 value $j$ and so on. To clarify
these formulas, we briefly study the two-layer three-state case.

We initialize $\alpha_1(i, j)$ using the prior probabilities of states $i$ and $j$.

$$\alpha_1(i, j) = \pi_i^{(0)} \pi_j^{(1)} \qquad \forall\, i, j \in \{0, 1, 2\}$$

Using Equation 12 we have

$$\alpha_2^{(2)}(i, j) = \alpha_1(i, j)$$

We now calculate $\alpha_2^{(1)}(i, j)$ and $\alpha_2^{(0)}(i, j)$ for all $i$ and $j$ using
Equations 10 and 11.

Having calculated $\alpha_2^{(0)}(i, j)$ we use Equation 9 to calculate $\alpha_2(i, j)$,
completing the recursion.

3 Experimental results

Our experiments tested a factorial HMM system on a phoneme classification task. We
used the phonetically balanced TIMIT database [Fisher et al., 1986]. Training was
performed on the "sx" and "si" training sentences. These create a training set with 3696
utterances from 168 different speakers. 250 sentences from the test set were used for
testing. The factorial HMM had 2 layers and 3 states in each layer. The standard Lee
phonetic clustering [Lee and Hon, 1989] was used, resulting in 48 phoneme models,
with these being further clustered during scoring to 39 models.

Model            % Error
Baseline HMM     42.9
Linear FHMM      71.3

Table 1: Classification Results - Linear FHMM vs HMM

A baseline system was also implemented. This was a 3-state left-to-right HMM
system. Mixtures of Gaussians were used to model the probability density of the
observation given the state. Eight mixture components were used per state.

We used cepstral and delta-cepstral features derived from 25.6ms long window 
frames. The dimension of the feature vector was 24 (12 cepstral and 12 delta cepstral 
features). 

3.1 Linear Factorial HMMs 

The first experiment investigated the performance of the linear factorial HMM. The 
results are shown in Table 1. For this experiment, the means and covariance were 
initialized using the mean and covariance of the pooled training data. 

These results demonstrate that the linear factorial HMM models speech poorly. 
A major problem here is that there are not enough system parameters to form a good 
model. The only way to introduce more system parameters would be to add more layers 
and/or states because there is no obvious way to incorporate mixtures of Gaussians into 
the linear FHMM framework. 

We therefore turn our attention to the streamed FHMM. 

3.2 Streamed Factorial HMMs 

The reestimation formulas for streamed FHMMs can be easily extended to the multiple
Gaussian mixture case. The streamed model also seems a more natural fit to speech feature
vectors, which are normally composed of several streams of sub-vectors. For example a typical
feature vector may consist of the cepstrum, delta cepstrum, second delta cepstrum, and sometimes
even energy and its derivatives. If these different "streams" have somewhat decoupled
dynamics, we hypothesize a factorial HMM could be a logical alternative to HMMs.
Each distinct sub-vector stream could be modeled by one of the layers in the FHMM.

In our experiments the parameters for each stream were initialized using regular 
HMMs trained on the features of the corresponding stream. Table 2 shows the results 
when one layer models the cepstrum and the other models the delta cepstrum. For 
completeness, the error rates of the HMMs trained on the cepstrum and delta cepstrum 
only are also shown. 8 mixture components per state were used in both HMMs and 
FHMMs. 



Model            Feature Vector               % Error
Baseline HMM     Cepstrum + Delta Cepstrum    42.9
Baseline HMM     Cepstrum                     51.6
Baseline HMM     Delta Cepstrum               62.3
Streamed FHMM    Cepstrum + Delta Cepstrum    46.3

Table 2: Classification Results - Streamed FHMM vs HMM



Figure 4: Sub-band Model (speech is split into feature sub-groups, each sub-group feeds its own classifier, and the classifier outputs are merged into a single result)



We can see that while the streamed FHMM produces reasonable results it is not 
able to improve upon the basic HMM model. 

A reason for this may be that there is only an advantage in using the FHMM if the
layers model processes with different dynamics. The cepstrum and delta cepstrum are
highly correlated, hence it is to be expected that they would have similar dynamics.

We therefore tried feature vectors that we expected to be somewhat more
decorrelated. It was hoped that the modeling assumptions of FHMMs might then be more
adequate and provide an edge over traditional HMMs.



3.3 Sub-band-based Speech Classification 

Recently, researchers such as [Bourlard and Dupont, 1996], [Hermansky et al., 1996] 
and [Bourlard and Dupont, 1997], have considered modeling partial frequency bands 
by separate HMMs and combining the probabilities from these at a suitable level (e.g. 
the phoneme level). The idea has its roots in models of human auditory perception. 
Figure 4 shows the sub-band model. 

Examining this figure we can see there is clearly a great deal of scope for research
when choosing the number of feature sub-groups and the merging technique. We do not
consider these issues in our work. We have implemented a simple two-band version of
the sub-band model using addition of the acoustic log likelihood at the phoneme level
as the merging technique. We call this system a "parallel" HMM.
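The merging rule of the parallel HMM amounts to summing the per-band log likelihoods for each candidate phoneme and picking the best total. A minimal sketch, with hypothetical names and made-up scores standing in for the outputs of the per-band HMMs:

```python
def classify_parallel(band_log_likelihoods):
    """Pick the phoneme with the highest summed log likelihood.

    band_log_likelihoods: one dict per band mapping a phoneme label to the
    acoustic log likelihood given by that band's HMM for the segment.
    """
    phonemes = band_log_likelihoods[0].keys()
    merged = {p: sum(band[p] for band in band_log_likelihoods)
              for p in phonemes}
    return max(merged, key=merged.get), merged

# Made-up scores for one segment and three candidate phonemes.
lower = {"aa": -120.0, "iy": -110.0, "s": -150.0}
upper = {"aa": -90.0, "iy": -105.0, "s": -95.0}
best, merged = classify_parallel([lower, upper])
```

Adding log likelihoods treats the two bands as independent given the phoneme, and each band's HMM is free to follow its own state sequence within the segment.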
Model            Feature Vector        % Error
Baseline HMM     Upper + Lower band    46.9
Baseline HMM     Upper band            66.7
Baseline HMM     Lower band            59.5
Parallel HMM     Upper + Lower band    45.6
Streamed FHMM    Upper + Lower band    48.3

Table 3: Classification Results - Streamed FHMM



The feature vectors for this system were derived as follows. A traditional mel-based
log spectrum vector with 40 components was generated. The log spectrum was divided
into two streams, the first one containing the lower 20 components and the second one
containing the upper 20 components. Each of the sub-vectors was transformed
by a DCT matrix of dimension 20x12, generating 2 cepstral vectors each of dimension
12. Each of these streams of vectors was then mean normalized. Delta features for the
resulting two streams were produced and appended to them.
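The pipeline can be sketched as follows. The DCT-II basis (including the zeroth coefficient), the per-utterance mean normalization and the first-difference deltas are assumptions of this example; the text above does not specify these details, and the function names are hypothetical.

```python
import math

def dct_matrix(n_in, n_out):
    """DCT-II basis: n_out cepstral coefficients from n_in log-spectral bins."""
    return [[math.cos(math.pi * k * (n + 0.5) / n_in) for n in range(n_in)]
            for k in range(n_out)]

def band_features(log_spectra, n_ceps=12):
    """Split each frame into two bands and build cepstra plus deltas."""
    half = len(log_spectra[0]) // 2
    dct = dct_matrix(half, n_ceps)
    streams = []
    for lo, hi in ((0, half), (half, 2 * half)):
        ceps = [[sum(c * x for c, x in zip(row, frame[lo:hi])) for row in dct]
                for frame in log_spectra]
        # Mean normalization over the utterance.
        mean = [sum(col) / len(ceps) for col in zip(*ceps)]
        ceps = [[v - m for v, m in zip(f, mean)] for f in ceps]
        # Deltas as a simple first difference (an assumption; zero at t = 0).
        deltas = [[0.0] * n_ceps] + [
            [b - a for a, b in zip(ceps[t - 1], ceps[t])]
            for t in range(1, len(ceps))]
        streams.append([c + d for c, d in zip(ceps, deltas)])
    return streams

# Three synthetic 40-bin log-spectrum frames.
frames = [[math.sin(0.1 * (t + 1) * b) for b in range(40)] for t in range(3)]
lower, upper = band_features(frames)
```

Each stream ends up with 24 components per frame (12 cepstral and 12 delta), so a two-layer streamed FHMM can assign one band to each layer.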

Table 3 shows the results for experiments using the banded feature vectors. We 
present results for tests using the baseline HMMs, FHMMs, parallel HMMs and also 
for HMMs trained on only the lower or upper band and their delta coefficients. 

The factorial HMM was initialized as follows. Each of the layers was trained first 
using traditional HMM techniques. These HMMs were the initial models used by the 
FHMM training algorithm. 

Again we can see that there is no advantage in using the FHMM model. 

4 Discussion 

Further work is needed to conclude whether factorial HMMs are a good alternative to HMMs.
Since the major advantage offered by these models appears to be their ability to model
a process which is composed of independently evolving sub-processes, the choice of
features is critical. If the features are highly correlated, factorial HMMs do not
seem to offer compelling advantages. This fact is noted by Brand [Brand, 1997] who
states that "conventional HMMs excel for processes that evolve in lockstep; FHMMs
are meant for processes that evolve independently".

We postulate, however, along similar lines as [Hermansky et al., 1996], that there
could be some advantage in using the FHMM framework to model speech and noise if
these were uncorrelated. Alternatively, if sub-band features were used, the FHMM could
provide more robust recognition in the case of corruption in one sub-band. Further
work is needed in this area.

The most interesting research direction, however, would be to investigate the
combination of traditional speech features with other information such as articulator positions,
language models or lip tracking information. The FHMM framework provides an
interesting way of combining several features without the need to collapse them
into a single augmented feature vector.

It is important to notice that alternative formulations combining the information
from each of the states in the meta-state are possible. In this report we have described
the linear FHMM and the streamed FHMM. Other alternatives remain to be explored.

We believe, therefore, that further research is needed to decide if algorithmic
extensions to HMMs such as factorial HMMs or coupled HMMs offer a good alternative to
traditional HMM techniques. The work in this report represents only a first effort
in this direction.



5 Conclusions 

We have presented factorial HMMs as possible extensions of hidden Markov models.
These models were investigated in the context of phoneme classification as a possible
replacement for traditional HMMs. We have also introduced and explored the concept
of streamed factorial HMMs. Our experimental results proved inconclusive. In the
experiments presented in this report, factorial HMMs did not appear to offer any
advantage over regular HMMs when traditional feature vectors were used. We postulate that
this is because any modeling advantage offered by factorial HMMs will only become
evident if less correlated features are used. We conclude the report with suggestions
for future work.